Leo el dataset e imprimo 10 muestras aleatorias.
import numpy as np
import pandas as pd
# Lectura de datos
df = pd.read_csv('data/star_class_clean.csv')
df.sample(10)
| alpha | delta | u | g | r | i | z | field_ID | spec_obj_ID | redshift | plate | MJD | fiber_ID | class | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 44757 | 201.284987 | 6.578814 | 23.76517 | 20.27294 | 18.65475 | 18.05437 | 17.64629 | 25 | 5.448430e+18 | 0.304137 | 4839 | 55703 | 730 | GALAXY |
| 58791 | 113.304000 | 46.050640 | 24.80070 | 22.13559 | 20.94321 | 20.10069 | 19.73501 | 40 | 7.191263e+18 | 0.210746 | 6387 | 56328 | 511 | GALAXY |
| 78185 | 32.710297 | -0.501351 | 19.16892 | 17.24695 | 16.31170 | 15.90658 | 15.55801 | 84 | 4.560643e+17 | 0.104709 | 405 | 51816 | 272 | GALAXY |
| 68658 | 249.083519 | 36.329917 | 21.66848 | 19.47318 | 18.05269 | 17.53316 | 17.16106 | 261 | 5.849082e+18 | 0.252509 | 5195 | 56048 | 116 | GALAXY |
| 79629 | 359.374800 | -9.694830 | 18.45171 | 16.55736 | 15.59452 | 15.08675 | 14.65964 | 18 | 7.308790e+17 | 0.076888 | 649 | 52201 | 618 | GALAXY |
| 20876 | 195.502211 | 3.097263 | 18.30561 | 17.10295 | 16.44189 | 16.07475 | 15.80847 | 472 | 5.900769e+17 | 0.079512 | 524 | 52027 | 383 | GALAXY |
| 89736 | 13.583456 | 6.879307 | 23.63074 | 21.35514 | 19.35130 | 18.64652 | 18.28877 | 86 | 5.118443e+18 | 0.362428 | 4546 | 55835 | 370 | GALAXY |
| 32454 | 118.949174 | 29.533928 | 18.74032 | 17.23029 | 16.61152 | 16.38045 | 16.27901 | 76 | 3.564617e+18 | 0.000139 | 3166 | 54830 | 66 | STAR |
| 11646 | 16.787658 | 19.037154 | 22.54780 | 21.86388 | 21.77435 | 21.36245 | 21.40748 | 64 | 8.581796e+18 | 1.470737 | 7622 | 56987 | 680 | QSO |
| 65297 | 148.811198 | 39.607704 | 22.67321 | 21.53589 | 20.86117 | 20.81367 | 20.55704 | 164 | 9.934045e+18 | 0.000491 | 8823 | 57446 | 836 | STAR |
Codifico la variable objetivo 'class' para que la clase sean números:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['class'] = le.fit_transform(df['class'])
df.sample(10)
| alpha | delta | u | g | r | i | z | field_ID | spec_obj_ID | redshift | plate | MJD | fiber_ID | class | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 126 | 335.508182 | 22.477694 | 22.79176 | 22.59233 | 21.50218 | 20.39903 | 19.68608 | 55 | 8.537752e+18 | 0.757253 | 7583 | 56958 | 193 | 0 |
| 81222 | 155.862418 | 9.253223 | 16.95792 | 15.58199 | 15.02629 | 14.79044 | 14.70758 | 239 | 3.213424e+18 | 0.000063 | 2854 | 54480 | 385 | 2 |
| 54418 | 20.042711 | 6.973735 | 24.66912 | 22.63787 | 22.03419 | 21.10190 | 19.82852 | 148 | 9.864203e+18 | -0.000149 | 8761 | 58430 | 706 | 2 |
| 77104 | 248.950134 | 31.354189 | 18.58170 | 17.64611 | 17.37666 | 17.09004 | 17.02099 | 341 | 1.508762e+18 | 0.081640 | 1340 | 52781 | 204 | 0 |
| 1794 | 226.034918 | 60.411183 | 22.44654 | 22.34569 | 20.86987 | 19.93842 | 19.54952 | 65 | 7.860054e+18 | 0.464973 | 6981 | 56443 | 535 | 0 |
| 34444 | 357.445402 | 2.889376 | 24.59534 | 21.55456 | 21.01167 | 20.67867 | 20.41307 | 312 | 4.816750e+18 | -0.000565 | 4278 | 55505 | 545 | 2 |
| 78478 | 217.445409 | 19.022253 | 25.64668 | 21.56526 | 20.13074 | 19.32818 | 18.99194 | 667 | 6.639451e+18 | 0.511254 | 5897 | 56042 | 69 | 0 |
| 75370 | 169.241787 | 53.336896 | 22.93631 | 21.54308 | 20.09273 | 19.23668 | 18.90856 | 140 | 7.541419e+18 | 0.468229 | 6698 | 56637 | 515 | 0 |
| 31594 | 162.246200 | 3.018694 | 22.88539 | 22.82034 | 21.06431 | 19.81143 | 19.36104 | 83 | 5.329119e+18 | 0.558515 | 4733 | 55649 | 853 | 0 |
| 99445 | 3.752676 | -0.418159 | 21.29472 | 19.95352 | 19.47857 | 19.16026 | 19.00937 | 141 | 7.735444e+17 | 0.157588 | 687 | 52518 | 186 | 0 |
Normalizo las variables dependientes "X", no el target "y" (class):
from sklearn.preprocessing import MinMaxScaler
target_col = 'class'
features_cols = df.loc[:, df.columns != target_col].columns.tolist()
# Normalizamos las variables, no el target
min_max_scaler = MinMaxScaler()
df[features_cols] = min_max_scaler.fit_transform(df[features_cols])
df.sample(10)
| alpha | delta | u | g | r | i | z | field_ID | spec_obj_ID | redshift | plate | MJD | fiber_ID | class | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11735 | 0.410967 | 0.798835 | 0.998367 | 0.998382 | 0.264240 | 0.239800 | 0.998554 | 0.051125 | 0.174015 | 0.001376 | 0.174009 | 0.298607 | 0.589590 | 2 |
| 49353 | 0.528811 | 0.597111 | 0.998677 | 0.998776 | 0.469091 | 0.428781 | 0.998974 | 0.195297 | 0.661105 | 0.145472 | 0.661103 | 0.805980 | 0.801802 | 1 |
| 47107 | 0.978346 | 0.471536 | 0.998889 | 0.998870 | 0.503068 | 0.447633 | 0.999021 | 0.194274 | 0.610769 | 0.001420 | 0.610781 | 0.879437 | 0.037037 | 0 |
| 75618 | 0.357929 | 0.640239 | 0.998743 | 0.998657 | 0.368360 | 0.315246 | 0.998692 | 0.062372 | 0.023044 | 0.019226 | 0.023044 | 0.050928 | 0.141141 | 0 |
| 36065 | 0.604141 | 0.494424 | 0.999281 | 0.999042 | 0.541385 | 0.447672 | 0.998993 | 0.632924 | 0.293229 | 0.080176 | 0.293217 | 0.552157 | 0.950951 | 0 |
| 15682 | 0.948018 | 0.201403 | 0.998960 | 0.999118 | 0.621398 | 0.520514 | 0.999186 | 0.197342 | 0.728034 | 0.119130 | 0.728035 | 0.827280 | 0.672673 | 0 |
| 32387 | 0.628952 | 0.562192 | 0.999145 | 0.999009 | 0.540322 | 0.468705 | 0.999036 | 0.223926 | 0.853267 | 0.001419 | 0.853269 | 0.910568 | 0.729730 | 2 |
| 20963 | 0.080031 | 0.341169 | 0.999238 | 0.999048 | 0.552207 | 0.449331 | 0.998978 | 0.413088 | 0.395500 | 0.070704 | 0.395489 | 0.577963 | 1.000000 | 0 |
| 23416 | 0.345697 | 0.226833 | 0.999015 | 0.999154 | 0.555309 | 0.452865 | 0.998992 | 0.029652 | 0.366168 | 0.084430 | 0.366175 | 0.581786 | 0.039039 | 0 |
| 33857 | 0.481611 | 0.695685 | 0.998909 | 0.999018 | 0.541568 | 0.443660 | 0.998972 | 0.370143 | 0.523082 | 0.177991 | 0.523084 | 0.656472 | 0.464464 | 1 |
División de datos en entrenamiento (80%) y test (20%):
from sklearn.model_selection import train_test_split
X = df[features_cols]
y = df[target_col]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print("División de datos en entrenamiento y test realizado correctamente")
División de datos en entrenamiento y test realizado correctamente
Vamos a llevar a cabo una optimización LightGBM (basado en árboles de decisión) mediante las distintas métricas de evaluación posibles: accuracy, precision o recall.
Como el dataset con el que estamos trabajando tiene un conjunto dedatos desbalanceado, la métrica que he escogido para llevar a cabo la optimización es el recall, ya que el uso de otras métricas podría afectar negativamente al rendimiento del modelo.
Después de realizar pruebas con distintos parámetros y con sus respectivos valores, he llegado a la conclusión de generar un modelo con las siguientes características y parámetros:
import numpy as np
import time
import optuna
import lightgbm as lgbm
from sklearn.metrics import recall_score
# Definimos los parámetros comúnes
param_common = {"boosting_type": 'gbdt',
"objective": 'multiclass',
"class_weight": "balanced",
"random_state": 0,
}
# Función que entrena y evalua un modelo LGBM
def objective(trial, X_train, y_train, X_test, y_test):
# Definimos la red de hiperparámetros a evaluar y su respectivo rango de valores
param_grid = {"n_estimators": trial.suggest_int("n_estimators", 150, 200),
"learning_rate": trial.suggest_float("learning_rate", 0.01, 0.04),
"num_leaves": trial.suggest_int("num_leaves", 2000, 2500, step=20),
"max_depth": trial.suggest_int("max_depth", 8, 15),
"bagging_fraction": trial.suggest_float("bagging_fraction", 0.75, 0.90),
"feature_fraction": trial.suggest_float("feature_fraction", 0.65, 0.75),
"bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
"min_child_samples": trial.suggest_int("min_child_samples", 50, 80)
}
# Instanciamos el modelo pasándole los hiperparámetros
lgbm_class = lgbm.LGBMClassifier(**param_common, **param_grid)
# Entrenamos el modelo con los datos de entrenamiento
lgbm_class.fit(X_train, y_train)
# Evaluamos el modelo con la métrica "recall" y devolvemos su valor
return recall_score(y_true=y_test, y_pred=lgbm_class.predict(X_test), average='macro')
print("Modelo entrenado y evaluado correctamente!")
Modelo entrenado y evaluado correctamente!
Creamos un estudio con 50 intentos para encontrar el modelo que mayor recall tenga y obtendremos los hiperparámetros que mejor modelo generen:
import optuna
import time
trial = []
study = optuna.create_study(direction="maximize",
study_name="Star Class - LightGBM Classifier",
sampler=optuna.samplers.TPESampler(seed=0),
pruner=optuna.pruners.HyperbandPruner())
func = lambda trial: objective(trial=trial, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
start_time = time.time()
study.optimize(func, n_trials=50)
end_time = time.time()
elapsed_time = end_time - start_time
# Guardamos el tiempo para poder leerlo después
elapsed_time_df = pd.DataFrame({"Estudio": ["Star Class - LightGBM Classifier"], "Tiempo (s)": [elapsed_time]})
[I 2024-04-02 19:29:12,232] A new study created in memory with name: Star Class - LightGBM Classifier
[LightGBM] [Warning] bagging_fraction is set=0.8135482199008357, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8135482199008357 [LightGBM] [Warning] feature_fraction is set=0.7145894113066656, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7145894113066656 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:29:28,055] Trial 0 finished with value: 0.9760803609814813 and parameters: {'n_estimators': 177, 'learning_rate': 0.03145568099117258, 'num_leaves': 2300, 'max_depth': 12, 'bagging_fraction': 0.8135482199008357, 'feature_fraction': 0.7145894113066656, 'bagging_freq': 4, 'min_child_samples': 77}. Best is trial 0 with value: 0.9760803609814813.
[LightGBM] [Warning] bagging_fraction is set=0.8352066841640898, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8352066841640898 [LightGBM] [Warning] feature_fraction is set=0.7425596638292661, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7425596638292661 [LightGBM] [Warning] bagging_freq is set=1, subsample_freq=0 will be ignored. Current value: bagging_freq=1
[I 2024-04-02 19:29:38,961] Trial 1 finished with value: 0.9761084880031111 and parameters: {'n_estimators': 199, 'learning_rate': 0.02150324556477333, 'num_leaves': 2400, 'max_depth': 12, 'bagging_fraction': 0.8352066841640898, 'feature_fraction': 0.7425596638292661, 'bagging_freq': 1, 'min_child_samples': 52}. Best is trial 1 with value: 0.9761084880031111.
[LightGBM] [Warning] bagging_fraction is set=0.8967927513349147, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8967927513349147 [LightGBM] [Warning] feature_fraction is set=0.7299158564216723, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7299158564216723 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:29:48,542] Trial 2 finished with value: 0.9762747124667918 and parameters: {'n_estimators': 151, 'learning_rate': 0.03497859536643814, 'num_leaves': 2400, 'max_depth': 14, 'bagging_fraction': 0.8967927513349147, 'feature_fraction': 0.7299158564216723, 'bagging_freq': 4, 'min_child_samples': 74}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.8282772482625107, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8282772482625107 [LightGBM] [Warning] feature_fraction is set=0.6914661939990524, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6914661939990524 [LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2
[I 2024-04-02 19:30:05,367] Trial 3 finished with value: 0.9760573092920467 and parameters: {'n_estimators': 156, 'learning_rate': 0.029197630639825715, 'num_leaves': 2060, 'max_depth': 15, 'bagging_fraction': 0.8282772482625107, 'feature_fraction': 0.6914661939990524, 'bagging_freq': 2, 'min_child_samples': 74}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.8418143584083633, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8418143584083633 [LightGBM] [Warning] feature_fraction is set=0.7116933996874757, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7116933996874757 [LightGBM] [Warning] bagging_freq is set=7, subsample_freq=0 will be ignored. Current value: bagging_freq=7
[I 2024-04-02 19:30:13,541] Trial 4 finished with value: 0.9761161010014036 and parameters: {'n_estimators': 173, 'learning_rate': 0.027053018466059453, 'num_leaves': 2000, 'max_depth': 12, 'bagging_fraction': 0.8418143584083633, 'feature_fraction': 0.7116933996874757, 'bagging_freq': 7, 'min_child_samples': 71}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.8500150073168502, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8500150073168502 [LightGBM] [Warning] feature_fraction is set=0.7170637869618159, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7170637869618159 [LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2
[I 2024-04-02 19:30:16,283] Trial 5 finished with value: 0.975965102534308 and parameters: {'n_estimators': 168, 'learning_rate': 0.023110958613980243, 'num_leaves': 2360, 'max_depth': 8, 'bagging_fraction': 0.8500150073168502, 'feature_fraction': 0.7170637869618159, 'bagging_freq': 2, 'min_child_samples': 53}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.898256075708884, subsample=1.0 will be ignored. Current value: bagging_fraction=0.898256075708884 [LightGBM] [Warning] feature_fraction is set=0.6602044810748028, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6602044810748028 [LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2
[I 2024-04-02 19:30:27,599] Trial 6 finished with value: 0.9761287893318911 and parameters: {'n_estimators': 166, 'learning_rate': 0.02091132312827868, 'num_leaves': 2280, 'max_depth': 11, 'bagging_fraction': 0.898256075708884, 'feature_fraction': 0.6602044810748028, 'bagging_freq': 2, 'min_child_samples': 55}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.773845437546828, subsample=1.0 will be ignored. Current value: bagging_fraction=0.773845437546828 [LightGBM] [Warning] feature_fraction is set=0.6610375141164305, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6610375141164305 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:30:32,216] Trial 7 finished with value: 0.9755637112332 and parameters: {'n_estimators': 183, 'learning_rate': 0.017598748076193462, 'num_leaves': 2240, 'max_depth': 9, 'bagging_fraction': 0.773845437546828, 'feature_fraction': 0.6610375141164305, 'bagging_freq': 5, 'min_child_samples': 54}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.8756917361248207, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8756917361248207 [LightGBM] [Warning] feature_fraction is set=0.6596098407893963, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6596098407893963 [LightGBM] [Warning] bagging_freq is set=7, subsample_freq=0 will be ignored. Current value: bagging_freq=7
[I 2024-04-02 19:30:35,942] Trial 8 finished with value: 0.9756531676302839 and parameters: {'n_estimators': 160, 'learning_rate': 0.021061755119828923, 'num_leaves': 2420, 'max_depth': 8, 'bagging_fraction': 0.8756917361248207, 'feature_fraction': 0.6596098407893963, 'bagging_freq': 7, 'min_child_samples': 64}. Best is trial 2 with value: 0.9762747124667918.
[LightGBM] [Warning] bagging_fraction is set=0.7924210443864614, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7924210443864614 [LightGBM] [Warning] feature_fraction is set=0.6620196561213169, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6620196561213169 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:30:39,416] Trial 9 finished with value: 0.9764668133350809 and parameters: {'n_estimators': 199, 'learning_rate': 0.028145365592351382, 'num_leaves': 2380, 'max_depth': 8, 'bagging_fraction': 0.7924210443864614, 'feature_fraction': 0.6620196561213169, 'bagging_freq': 3, 'min_child_samples': 53}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7541584039871952, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7541584039871952 [LightGBM] [Warning] feature_fraction is set=0.6859674378029715, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6859674378029715 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:30:44,170] Trial 10 finished with value: 0.9758057042762097 and parameters: {'n_estimators': 199, 'learning_rate': 0.03905669846721026, 'num_leaves': 2160, 'max_depth': 10, 'bagging_fraction': 0.7541584039871952, 'feature_fraction': 0.6859674378029715, 'bagging_freq': 5, 'min_child_samples': 63}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8001205254707947, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8001205254707947 [LightGBM] [Warning] feature_fraction is set=0.7478322159576245, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7478322159576245 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:30:57,639] Trial 11 finished with value: 0.9764485499287647 and parameters: {'n_estimators': 150, 'learning_rate': 0.0352864266897114, 'num_leaves': 2480, 'max_depth': 15, 'bagging_fraction': 0.8001205254707947, 'feature_fraction': 0.7478322159576245, 'bagging_freq': 4, 'min_child_samples': 80}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7964191795108808, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7964191795108808 [LightGBM] [Warning] feature_fraction is set=0.7466777946335997, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7466777946335997 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:31:16,970] Trial 12 finished with value: 0.9754983060128181 and parameters: {'n_estimators': 189, 'learning_rate': 0.010167904842887415, 'num_leaves': 2480, 'max_depth': 14, 'bagging_fraction': 0.7964191795108808, 'feature_fraction': 0.7466777946335997, 'bagging_freq': 3, 'min_child_samples': 59}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7987167646270676, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7987167646270676 [LightGBM] [Warning] feature_fraction is set=0.6796483194554959, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6796483194554959 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:31:26,478] Trial 13 finished with value: 0.9758297554529118 and parameters: {'n_estimators': 190, 'learning_rate': 0.03370478672604048, 'num_leaves': 2500, 'max_depth': 10, 'bagging_fraction': 0.7987167646270676, 'feature_fraction': 0.6796483194554959, 'bagging_freq': 5, 'min_child_samples': 80}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.78446753862147, subsample=1.0 will be ignored. Current value: bagging_fraction=0.78446753862147 [LightGBM] [Warning] feature_fraction is set=0.7002408648109376, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7002408648109376 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:31:40,224] Trial 14 finished with value: 0.976005843531906 and parameters: {'n_estimators': 182, 'learning_rate': 0.03937338123780096, 'num_leaves': 2460, 'max_depth': 14, 'bagging_fraction': 0.78446753862147, 'feature_fraction': 0.7002408648109376, 'bagging_freq': 3, 'min_child_samples': 67}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8095242372898908, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8095242372898908 [LightGBM] [Warning] feature_fraction is set=0.6728945129256082, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6728945129256082 [LightGBM] [Warning] bagging_freq is set=6, subsample_freq=0 will be ignored. Current value: bagging_freq=6
[I 2024-04-02 19:31:59,886] Trial 15 finished with value: 0.9762311467540198 and parameters: {'n_estimators': 151, 'learning_rate': 0.02802991074851123, 'num_leaves': 2340, 'max_depth': 13, 'bagging_fraction': 0.8095242372898908, 'feature_fraction': 0.6728945129256082, 'bagging_freq': 6, 'min_child_samples': 58}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7722284593356672, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7722284593356672 [LightGBM] [Warning] feature_fraction is set=0.7329864603243553, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7329864603243553 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:32:23,695] Trial 16 finished with value: 0.9757572759258002 and parameters: {'n_estimators': 192, 'learning_rate': 0.034104162008933625, 'num_leaves': 2220, 'max_depth': 15, 'bagging_fraction': 0.7722284593356672, 'feature_fraction': 0.7329864603243553, 'bagging_freq': 3, 'min_child_samples': 68}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8146949342129678, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8146949342129678 [LightGBM] [Warning] feature_fraction is set=0.6523150947542089, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6523150947542089 [LightGBM] [Warning] bagging_freq is set=1, subsample_freq=0 will be ignored. Current value: bagging_freq=1
[I 2024-04-02 19:32:31,276] Trial 17 finished with value: 0.9761543786874233 and parameters: {'n_estimators': 164, 'learning_rate': 0.03072715950282309, 'num_leaves': 2440, 'max_depth': 10, 'bagging_fraction': 0.8146949342129678, 'feature_fraction': 0.6523150947542089, 'bagging_freq': 1, 'min_child_samples': 60}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7503025184092167, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7503025184092167 [LightGBM] [Warning] feature_fraction is set=0.6991155143721897, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6991155143721897 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:32:39,360] Trial 18 finished with value: 0.9760725352886316 and parameters: {'n_estimators': 172, 'learning_rate': 0.02687013023054441, 'num_leaves': 2340, 'max_depth': 9, 'bagging_fraction': 0.7503025184092167, 'feature_fraction': 0.6991155143721897, 'bagging_freq': 4, 'min_child_samples': 50}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7949920984387248, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7949920984387248 [LightGBM] [Warning] feature_fraction is set=0.6728375298672907, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.6728375298672907 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:32:59,848] Trial 19 finished with value: 0.9757577756694338 and parameters: {'n_estimators': 179, 'learning_rate': 0.037710525646651366, 'num_leaves': 2160, 'max_depth': 13, 'bagging_fraction': 0.7949920984387248, 'feature_fraction': 0.6728375298672907, 'bagging_freq': 4, 'min_child_samples': 70}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7790544564238275, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7790544564238275 [LightGBM] [Warning] feature_fraction is set=0.7264753149893807, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7264753149893807 [LightGBM] [Warning] bagging_freq is set=6, subsample_freq=0 will be ignored. Current value: bagging_freq=6
[I 2024-04-02 19:33:08,133] Trial 20 finished with value: 0.9762004820662926 and parameters: {'n_estimators': 157, 'learning_rate': 0.03222616287444538, 'num_leaves': 2500, 'max_depth': 11, 'bagging_fraction': 0.7790544564238275, 'feature_fraction': 0.7264753149893807, 'bagging_freq': 6, 'min_child_samples': 80}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.869051675650112, subsample=1.0 will be ignored. Current value: bagging_fraction=0.869051675650112 [LightGBM] [Warning] feature_fraction is set=0.7485902560299506, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7485902560299506 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:33:25,938] Trial 21 finished with value: 0.9761974446565614 and parameters: {'n_estimators': 150, 'learning_rate': 0.0366355075214305, 'num_leaves': 2380, 'max_depth': 14, 'bagging_fraction': 0.869051675650112, 'feature_fraction': 0.7485902560299506, 'bagging_freq': 3, 'min_child_samples': 75}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8243755969459841, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8243755969459841 [LightGBM] [Warning] feature_fraction is set=0.7382712602921161, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7382712602921161 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:33:40,405] Trial 22 finished with value: 0.9762772501328891 and parameters: {'n_estimators': 153, 'learning_rate': 0.03527939338123826, 'num_leaves': 2420, 'max_depth': 15, 'bagging_fraction': 0.8243755969459841, 'feature_fraction': 0.7382712602921161, 'bagging_freq': 4, 'min_child_samples': 77}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8210336886226735, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8210336886226735 [LightGBM] [Warning] feature_fraction is set=0.738389852205777, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.738389852205777 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:34:04,767] Trial 23 finished with value: 0.9755302218302989 and parameters: {'n_estimators': 160, 'learning_rate': 0.03561669039589098, 'num_leaves': 2460, 'max_depth': 15, 'bagging_fraction': 0.8210336886226735, 'feature_fraction': 0.738389852205777, 'bagging_freq': 5, 'min_child_samples': 77}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8062373764531802, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8062373764531802 [LightGBM] [Warning] feature_fraction is set=0.7352756530794984, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7352756530794984 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:34:11,875] Trial 24 finished with value: 0.9761290020264483 and parameters: {'n_estimators': 156, 'learning_rate': 0.03686942776491179, 'num_leaves': 2300, 'max_depth': 13, 'bagging_fraction': 0.8062373764531802, 'feature_fraction': 0.7352756530794984, 'bagging_freq': 3, 'min_child_samples': 78}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8265054963162367, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8265054963162367 [LightGBM] [Warning] feature_fraction is set=0.7425745215466245, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7425745215466245 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:34:22,971] Trial 25 finished with value: 0.9761259646167172 and parameters: {'n_estimators': 194, 'learning_rate': 0.03305699148241431, 'num_leaves': 2440, 'max_depth': 15, 'bagging_fraction': 0.8265054963162367, 'feature_fraction': 0.7425745215466245, 'bagging_freq': 4, 'min_child_samples': 72}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7909128912409519, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7909128912409519 [LightGBM] [Warning] feature_fraction is set=0.7246948392936614, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7246948392936614 [LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2
[I 2024-04-02 19:34:26,269] Trial 26 finished with value: 0.9762106327306825 and parameters: {'n_estimators': 185, 'learning_rate': 0.030181903708121755, 'num_leaves': 2360, 'max_depth': 9, 'bagging_fraction': 0.7909128912409519, 'feature_fraction': 0.7246948392936614, 'bagging_freq': 2, 'min_child_samples': 62}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8055835388650705, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8055835388650705 [LightGBM] [Warning] feature_fraction is set=0.748138931618013, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.748138931618013 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:34:34,419] Trial 27 finished with value: 0.9760900119022377 and parameters: {'n_estimators': 170, 'learning_rate': 0.032449484322320525, 'num_leaves': 2420, 'max_depth': 13, 'bagging_fraction': 0.8055835388650705, 'feature_fraction': 0.748138931618013, 'bagging_freq': 3, 'min_child_samples': 67}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8174510519325244, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8174510519325244 [LightGBM] [Warning] feature_fraction is set=0.7245004823941962, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7245004823941962 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:34:43,496] Trial 28 finished with value: 0.9757010218825407 and parameters: {'n_estimators': 162, 'learning_rate': 0.03996497961051686, 'num_leaves': 2200, 'max_depth': 14, 'bagging_fraction': 0.8174510519325244, 'feature_fraction': 0.7245004823941962, 'bagging_freq': 5, 'min_child_samples': 56}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.8114581814808931, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8114581814808931 [LightGBM] [Warning] feature_fraction is set=0.7384472853980811, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7384472853980811 [LightGBM] [Warning] bagging_freq is set=6, subsample_freq=0 will be ignored. Current value: bagging_freq=6
[I 2024-04-02 19:34:49,125] Trial 29 finished with value: 0.9759729282271578 and parameters: {'n_estimators': 177, 'learning_rate': 0.031006187426339683, 'num_leaves': 2320, 'max_depth': 11, 'bagging_fraction': 0.8114581814808931, 'feature_fraction': 0.7384472853980811, 'bagging_freq': 6, 'min_child_samples': 77}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.7854419943626865, subsample=1.0 will be ignored. Current value: bagging_fraction=0.7854419943626865 [LightGBM] [Warning] feature_fraction is set=0.7499076914330858, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7499076914330858 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:34:57,177] Trial 30 finished with value: 0.9756483049926463 and parameters: {'n_estimators': 153, 'learning_rate': 0.03485206790028374, 'num_leaves': 2280, 'max_depth': 15, 'bagging_fraction': 0.7854419943626865, 'feature_fraction': 0.7499076914330858, 'bagging_freq': 4, 'min_child_samples': 79}. Best is trial 9 with value: 0.9764668133350809.
[LightGBM] [Warning] bagging_fraction is set=0.848786709125654, subsample=1.0 will be ignored. Current value: bagging_fraction=0.848786709125654 [LightGBM] [Warning] feature_fraction is set=0.7303621477118888, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7303621477118888 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:35:05,259] Trial 31 finished with value: 0.9765476572466051 and parameters: {'n_estimators': 153, 'learning_rate': 0.034696496881949423, 'num_leaves': 2400, 'max_depth': 14, 'bagging_fraction': 0.848786709125654, 'feature_fraction': 0.7303621477118888, 'bagging_freq': 4, 'min_child_samples': 75}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8319865556397665, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8319865556397665 [LightGBM] [Warning] feature_fraction is set=0.7408452786323428, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7408452786323428 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:35:13,755] Trial 32 finished with value: 0.9758778967542451 and parameters: {'n_estimators': 154, 'learning_rate': 0.037475605202483844, 'num_leaves': 2380, 'max_depth': 15, 'bagging_fraction': 0.8319865556397665, 'feature_fraction': 0.7408452786323428, 'bagging_freq': 4, 'min_child_samples': 75}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8384740184557448, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8384740184557448 [LightGBM] [Warning] feature_fraction is set=0.735115377589215, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.735115377589215 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:35:22,266] Trial 33 finished with value: 0.9761201768463312 and parameters: {'n_estimators': 159, 'learning_rate': 0.03516891726133278, 'num_leaves': 2400, 'max_depth': 14, 'bagging_fraction': 0.8384740184557448, 'feature_fraction': 0.735115377589215, 'bagging_freq': 4, 'min_child_samples': 73}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8216469554404122, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8216469554404122 [LightGBM] [Warning] feature_fraction is set=0.7304014133591633, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7304014133591633 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:35:28,510] Trial 34 finished with value: 0.9762030197323902 and parameters: {'n_estimators': 153, 'learning_rate': 0.032828216797307824, 'num_leaves': 2480, 'max_depth': 12, 'bagging_fraction': 0.8216469554404122, 'feature_fraction': 0.7304014133591633, 'bagging_freq': 3, 'min_child_samples': 76}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8446424530963995, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8446424530963995 [LightGBM] [Warning] feature_fraction is set=0.7180659710201744, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7180659710201744 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:35:38,656] Trial 35 finished with value: 0.9758272177868142 and parameters: {'n_estimators': 150, 'learning_rate': 0.03153611086859189, 'num_leaves': 2420, 'max_depth': 14, 'bagging_fraction': 0.8446424530963995, 'feature_fraction': 0.7180659710201744, 'bagging_freq': 4, 'min_child_samples': 78}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8311883996502487, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8311883996502487 [LightGBM] [Warning] feature_fraction is set=0.7428543901551529, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7428543901551529 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:35:47,666] Trial 36 finished with value: 0.9760803609814813 and parameters: {'n_estimators': 156, 'learning_rate': 0.029542583002633793, 'num_leaves': 2460, 'max_depth': 15, 'bagging_fraction': 0.8311883996502487, 'feature_fraction': 0.7428543901551529, 'bagging_freq': 5, 'min_child_samples': 70}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8517940874452554, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8517940874452554 [LightGBM] [Warning] feature_fraction is set=0.7103287839043507, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7103287839043507 [LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2
[I 2024-04-02 19:36:11,783] Trial 37 finished with value: 0.9757296486478042 and parameters: {'n_estimators': 200, 'learning_rate': 0.036055133949401005, 'num_leaves': 2380, 'max_depth': 13, 'bagging_fraction': 0.8517940874452554, 'feature_fraction': 0.7103287839043507, 'bagging_freq': 2, 'min_child_samples': 73}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8018237946847233, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8018237946847233 [LightGBM] [Warning] feature_fraction is set=0.7321973321035629, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7321973321035629 [LightGBM] [Warning] bagging_freq is set=1, subsample_freq=0 will be ignored. Current value: bagging_freq=1
[I 2024-04-02 19:36:24,172] Trial 38 finished with value: 0.9760466588840231 and parameters: {'n_estimators': 196, 'learning_rate': 0.03430770970181614, 'num_leaves': 2100, 'max_depth': 12, 'bagging_fraction': 0.8018237946847233, 'feature_fraction': 0.7321973321035629, 'bagging_freq': 1, 'min_child_samples': 51}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8161183106057058, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8161183106057058 [LightGBM] [Warning] feature_fraction is set=0.743331174861854, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.743331174861854 [LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2
[I 2024-04-02 19:36:41,485] Trial 39 finished with value: 0.9759749661496215 and parameters: {'n_estimators': 164, 'learning_rate': 0.027790236906342546, 'num_leaves': 2320, 'max_depth': 15, 'bagging_fraction': 0.8161183106057058, 'feature_fraction': 0.743331174861854, 'bagging_freq': 2, 'min_child_samples': 75}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.827499579732742, subsample=1.0 will be ignored. Current value: bagging_fraction=0.827499579732742 [LightGBM] [Warning] feature_fraction is set=0.7229783559983337, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7229783559983337 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:36:49,633] Trial 40 finished with value: 0.9759853295085689 and parameters: {'n_estimators': 168, 'learning_rate': 0.03828377634749398, 'num_leaves': 2420, 'max_depth': 14, 'bagging_fraction': 0.827499579732742, 'feature_fraction': 0.7229783559983337, 'bagging_freq': 4, 'min_child_samples': 80}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8967742389026684, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8967742389026684 [LightGBM] [Warning] feature_fraction is set=0.7300688425010876, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7300688425010876 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:36:52,592] Trial 41 finished with value: 0.9762489104167021 and parameters: {'n_estimators': 153, 'learning_rate': 0.03596459688171224, 'num_leaves': 2400, 'max_depth': 8, 'bagging_fraction': 0.8967742389026684, 'feature_fraction': 0.7300688425010876, 'bagging_freq': 3, 'min_child_samples': 74}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8560814727043262, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8560814727043262 [LightGBM] [Warning] feature_fraction is set=0.7374315844752108, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7374315844752108 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:37:07,791] Trial 42 finished with value: 0.976289438719743 and parameters: {'n_estimators': 157, 'learning_rate': 0.033975383953251344, 'num_leaves': 2360, 'max_depth': 14, 'bagging_fraction': 0.8560814727043262, 'feature_fraction': 0.7374315844752108, 'bagging_freq': 4, 'min_child_samples': 78}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8558951091534237, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8558951091534237 [LightGBM] [Warning] feature_fraction is set=0.7373116452634333, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7373116452634333 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:37:28,348] Trial 43 finished with value: 0.9759678528949628 and parameters: {'n_estimators': 155, 'learning_rate': 0.03368845879646922, 'num_leaves': 2340, 'max_depth': 14, 'bagging_fraction': 0.8558951091534237, 'feature_fraction': 0.7373116452634333, 'bagging_freq': 5, 'min_child_samples': 77}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8406544216186853, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8406544216186853 [LightGBM] [Warning] feature_fraction is set=0.7445511207407088, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7445511207407088 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:37:39,093] Trial 44 finished with value: 0.9760593472145106 and parameters: {'n_estimators': 158, 'learning_rate': 0.03164619944824225, 'num_leaves': 2440, 'max_depth': 15, 'bagging_fraction': 0.8406544216186853, 'feature_fraction': 0.7445511207407088, 'bagging_freq': 4, 'min_child_samples': 79}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8613806392833563, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8613806392833563 [LightGBM] [Warning] feature_fraction is set=0.7379478700771334, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7379478700771334 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:37:46,984] Trial 45 finished with value: 0.9756120652290905 and parameters: {'n_estimators': 162, 'learning_rate': 0.03821984825420811, 'num_leaves': 2360, 'max_depth': 14, 'bagging_fraction': 0.8613806392833563, 'feature_fraction': 0.7379478700771334, 'bagging_freq': 4, 'min_child_samples': 78}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8365542565057527, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8365542565057527 [LightGBM] [Warning] feature_fraction is set=0.728243338772028, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.728243338772028 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:37:58,945] Trial 46 finished with value: 0.9757735014096524 and parameters: {'n_estimators': 150, 'learning_rate': 0.029299693668258478, 'num_leaves': 2500, 'max_depth': 13, 'bagging_fraction': 0.8365542565057527, 'feature_fraction': 0.728243338772028, 'bagging_freq': 3, 'min_child_samples': 76}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8458674664025277, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8458674664025277 [LightGBM] [Warning] feature_fraction is set=0.7332339101339279, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7332339101339279 [LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[I 2024-04-02 19:38:08,100] Trial 47 finished with value: 0.9760159941962959 and parameters: {'n_estimators': 153, 'learning_rate': 0.03482290558419958, 'num_leaves': 2260, 'max_depth': 15, 'bagging_fraction': 0.8458674664025277, 'feature_fraction': 0.7332339101339279, 'bagging_freq': 5, 'min_child_samples': 65}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8345727386650562, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8345727386650562 [LightGBM] [Warning] feature_fraction is set=0.7467901594808177, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7467901594808177 [LightGBM] [Warning] bagging_freq is set=4, subsample_freq=0 will be ignored. Current value: bagging_freq=4
[I 2024-04-02 19:38:19,592] Trial 48 finished with value: 0.9756351169185251 and parameters: {'n_estimators': 186, 'learning_rate': 0.03362720375498702, 'num_leaves': 2000, 'max_depth': 15, 'bagging_fraction': 0.8345727386650562, 'feature_fraction': 0.7467901594808177, 'bagging_freq': 4, 'min_child_samples': 71}. Best is trial 31 with value: 0.9765476572466051.
[LightGBM] [Warning] bagging_fraction is set=0.8238346695394376, subsample=1.0 will be ignored. Current value: bagging_fraction=0.8238346695394376 [LightGBM] [Warning] feature_fraction is set=0.7193578806765899, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.7193578806765899 [LightGBM] [Warning] bagging_freq is set=3, subsample_freq=0 will be ignored. Current value: bagging_freq=3
[I 2024-04-02 19:38:27,489] Trial 49 finished with value: 0.9760261448606858 and parameters: {'n_estimators': 161, 'learning_rate': 0.03650324818461277, 'num_leaves': 2480, 'max_depth': 14, 'bagging_fraction': 0.8238346695394376, 'feature_fraction': 0.7193578806765899, 'bagging_freq': 3, 'min_child_samples': 79}. Best is trial 31 with value: 0.9765476572466051.
Observamos el tiempo que toma el estudio
elapsed_time_df
| Estudio | Tiempo (s) | |
|---|---|---|
| 0 | Star Class - LightGBM Classifier | 555.258102 |
Visualizamos los resultados de cada experimento realizado por el estudio:
%matplotlib inline
from matplotlib import pyplot as plt
fig = optuna.visualization.plot_optimization_history(study)
fig.layout.title.text = 'Histograma de Optimización'
fig.layout.xaxis.title = 'Número de experimento'
fig.layout.yaxis.title = 'Valor de la función objetivo'
# Actualiza la leyenda
for trace in fig.data:
if trace.name == 'Best Value':
trace.name = 'Mejor Valor'
elif trace.name == 'Objective Value':
trace.name = 'Valor Objetivo'
fig.show(renderer="notebook")
plt.show()
Observamos el resultado del mejor intento:
print("Number of finished trials: {}".format(len(study.trials)))
# Obtenemos el mejor modelo conseguido
print("Best trial:")
trial = study.best_trial
print(" Value: {}".format(trial.value))
print(" Params: ")
params = {**param_common, **trial.params}
for key, value in params.items():
print("\t{}: {}".format(key, value))
Number of finished trials: 50 Best trial: Value: 0.9765476572466051 Params: boosting_type: gbdt objective: multiclass class_weight: balanced random_state: 0 n_estimators: 153 learning_rate: 0.034696496881949423 num_leaves: 2400 max_depth: 14 bagging_fraction: 0.848786709125654 feature_fraction: 0.7303621477118888 bagging_freq: 4 min_child_samples: 75
Primero genero ambos modelos (el default y el optimizer):
from sklearn.model_selection import train_test_split
modelos = {'Default': lgbm.LGBMClassifier(class_weight='balanced'),
'Optimizer': lgbm.LGBMClassifier(**params)}
tiempos_entrenamiento = {}
# Ajustamos los modelos y medimos el tiempo
for nombre, modelo in modelos.items():
print(f'CREANDO MODELO: {nombre}...')
inicio_entrenamiento = time.time()
modelo.fit(X_train, y_train)
fin_entrenamiento = time.time()
tiempos_entrenamiento[nombre] = fin_entrenamiento - inicio_entrenamiento
print("\nModelos generados correctamente!")
CREANDO MODELO: Default... CREANDO MODELO: Optimizer... Modelos generados correctamente!
# Convertimos los tiempos de entrenamiento a un DataFrame para una visualización en forma de tabla
df_tiempos = pd.DataFrame(list(tiempos_entrenamiento.items()), columns=['Modelo', 'Tiempo de entrenamiento (s)'])
# Mostramos la tabla
df_tiempos
| Modelo | Tiempo de entrenamiento (s) | |
|---|---|---|
| 0 | Default | 0.624040 |
| 1 | Optimizer | 8.466569 |
Posteriormente comparo las métricas de evaluación de ambos modelos:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import pandas as pd
evaluacion = []
for k, v in modelos.items():
print(f'EVALUANDO MODELO: {k}...')
y_pred_train = v.predict(X_train)
y_pred_test = v.predict(X_test)
y_prob_train = v.predict_proba(X_train)
y_prob_test = v.predict_proba(X_test)
if y_prob_train.shape[1] == 2:
y_prob_train = y_prob_train[:, 1]
y_prob_test = y_prob_test[:, 1]
evaluacion.extend([
{'Modelo': f'{k} - Train', 'Accuracy': accuracy_score(y_train, y_pred_train),
'Precision': precision_score(y_train, y_pred_train, average='macro'),
'Recall': recall_score(y_train, y_pred_train, average='macro'),
'F1': f1_score(y_train, y_pred_train, average='macro')},
{'Modelo': f'{k} - Test', 'Accuracy': accuracy_score(y_test, y_pred_test),
'Precision': precision_score(y_test, y_pred_test, average='macro'),
'Recall': recall_score(y_test, y_pred_test, average='macro'),
'F1': f1_score(y_test, y_pred_test, average='macro')},
])
df_evaluacion = pd.DataFrame(evaluacion)
df_evaluacion.set_index('Modelo', inplace=True)
df_evaluacion
EVALUANDO MODELO: Default... EVALUANDO MODELO: Optimizer...
| Accuracy | Precision | Recall | F1 | |
|---|---|---|---|---|
| Modelo | ||||
| Default - Train | 0.98315 | 0.978603 | 0.982576 | 0.980561 |
| Default - Test | 0.97605 | 0.970451 | 0.974983 | 0.972686 |
| Optimizer - Train | 0.98925 | 0.984926 | 0.990540 | 0.987689 |
| Optimizer - Test | 0.97810 | 0.973430 | 0.976548 | 0.974976 |
Por último, genero las matrices de confusión de ambos modelos y las imprimo:
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
%matplotlib inline
# Obtenemos las Matrices de confusión
msc = list()
for k, v in modelos.items():
print('Obteniendo Matriz de Confusión del modelo: {model}...'.format(model=k))
model = {}
model['name'] = k
y_pred_train = v.predict(X_train)
y_pred_test = v.predict(X_test)
cm_train = confusion_matrix(y_true=y_train, y_pred=y_pred_train)
cm_test = confusion_matrix(y_true=y_test, y_pred=y_pred_test)
model['confusion_matrix_train'] = cm_train
model['confusion_matrix_test'] = cm_test
msc.append(model)
print('\nMatrices de confusión generadas correctamente:')
# Modificamos la función para mostrar valores absolutos y ajustar el color según el modelo
def plot_confusion_matrix(cm, classes, title, cmap):
"""
Esta función imprime y dibuja la matriz de confusión.
"""
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, format(cm[i, j], 'd'), horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('Etiqueta real')
plt.xlabel('Etiqueta predicha')
# Pintamos las matrices de confusión con números absolutos y colores específicos
plt.figure(figsize=(15, 8))
cmap_by_model = {'Default': plt.cm.Reds, 'Optimizer': plt.cm.YlGn}
for i, mc in enumerate(msc):
cmap = cmap_by_model[mc['name']]
plt.subplot(2, 2, i*2 + 1)
plot_confusion_matrix(mc['confusion_matrix_train'], classes=['0', '1', '2'],
title='{}\nMatriz de Confusión - Entrenamiento (Train)'.format(mc['name']), cmap=cmap)
plt.subplot(2, 2, i*2 + 2)
plot_confusion_matrix(mc['confusion_matrix_test'], classes=['0', '1', '2'],
title='{}\nMatriz de Confusión - Prueba (Test)'.format(mc['name']), cmap=cmap)
plt.show()
Obteniendo Matriz de Confusión del modelo: Default... Obteniendo Matriz de Confusión del modelo: Optimizer... Matrices de confusión generadas correctamente:
/tmp/ipykernel_2833/3617826108.py:52: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
Genero las matrices de confusión de ambos modelos, tratando con sus porcentajes, y las imprimo:
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
%matplotlib inline
# Obtenemos las Matrices de confusión
msc = list()
for k, v in modelos.items():
print('Obteniendo Matriz de Confusión del modelo: {model}...'.format(model=k))
model = {}
model['name'] = k
y_pred_train = v.predict(X_train)
y_pred_test = v.predict(X_test)
cm_train = confusion_matrix(y_true=y_train, y_pred=y_pred_train)
cm_test = confusion_matrix(y_true=y_test, y_pred=y_pred_test)
# Convertimos a porcentajes
model['confusion_matrix_train'] = (cm_train.astype('float') / cm_train.sum(axis=1)[:, np.newaxis]) * 100
model['confusion_matrix_test'] = (cm_test.astype('float') / cm_test.sum(axis=1)[:, np.newaxis]) * 100
msc.append(model)
print('\nMatrices de confusión generadas correctamente:')
# Ajustamos la función para usar diferentes mapas de colores según el modelo y mostrar porcentajes
def plot_confusion_matrix(cm, classes, title, cmap):
"""
Esta función imprime y dibuja la matriz de confusión.
"""
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, '{:.1f}%'.format(cm[i, j]), horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('Etiqueta real')
plt.xlabel('Etiqueta predicha')
# Seleccionamos los mapas de colores
cmap_by_model = {'Default': plt.cm.Reds, 'Optimizer': plt.cm.YlGn}
# Pintamos las matrices de confusión con porcentajes y colores específicos
plt.figure(figsize=(15, 8))
for i, mc in enumerate(msc):
cmap = cmap_by_model.get(mc['name'], plt.cm.Purples)
plt.subplot(2, len(modelos), i*2 + 1)
plot_confusion_matrix(mc['confusion_matrix_train'], classes=['0', '1', '2'],
title='{}\nMatriz de Confusión - Entrenamiento (Train)'.format(mc['name']), cmap=cmap)
plt.subplot(2, len(modelos), i*2 + 2)
plot_confusion_matrix(mc['confusion_matrix_test'], classes=['0', '1', '2'],
title='{}\nMatriz de Confusión - Prueba (Test)'.format(mc['name']), cmap=cmap)
plt.tight_layout()
plt.show()
Obteniendo Matriz de Confusión del modelo: Default... Obteniendo Matriz de Confusión del modelo: Optimizer... Matrices de confusión generadas correctamente:
/tmp/ipykernel_2833/4102839127.py:55: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed.
Se puede observar de manera visual la mejora en los resultados, tanto en el conjuntos de datos de entrenamiento como en el de prueba.
El algoritmo LightGBM optimizado con Optuna proporciona un modelo de alta eficacia para clasificación multiclase. La elección de los hiperparámetros fue guiada por un estudio que se centró en optimizar la métrica de "recall", la cual es altamente apropiada en este caso, dado que la clasificación incorrecta de las clases puede ser más perjudicial que la precisión de la clasificación correcta. En otras palabras, es preferible minimizar los falsos negativos.
El modelo optimizado superó al modelo de LightGBM con hiperparámetros por defecto en todas las métricas de evaluación, demostrando la eficacia del proceso de optimización. Por ejemplo, la precisión en el conjunto de pruebas del modelo optimizado fue de 0.9786 frente a 0.9704 en el modelo por defecto, y el recall en el conjunto de pruebas fue de 0.9825 frente a 0.9749, lo que confirma la superioridad del modelo optimizado, aunque no sea una mejora aplastante, ya que ambos modelos son bastante buenos.
La configuración de los hiperparámetros en el modelo optimizado muestra una combinación equilibrada entre la complejidad del modelo y el control del sobreajuste. En cuanto a los resultados del mejor intento del estudio realizado:
El "learning_rate" relativamente bajo (0.0347) y el alto número de estimadores (153) permiten que el modelo aprenda lentamente y de manera más precisa.
Al mismo tiempo, la "bagging_fraction" (0.8487) y la "feature_fraction" (0.7303) ayudan a regularizar el modelo al entrenarlo solo en una fracción del conjunto de datos y características, respectivamente.
La profundidad máxima (14) y el número de hojas (2400) se ajustaron de manera óptima para permitir suficiente complejidad del modelo, pero sin caer en el sobreajuste.
El número de "min_child_samples" (75) también contribuye a evitar el sobreajuste, obligando a las hojas a tener al menos ese número de muestras.
En resumen, este proceso de optimización ha demostrado la importancia de una cuidadosa selección y ajuste de los hiperparámetros para obtener un rendimiento superior en la clasificación multiclase, logrando encontrar una configuración de hiperparámetros que maximiza el recall, lo que resulta en un modelo robusto y efectivo.